Abstract

The entropy of a binary symmetric Hidden Markov Process is calculated as an expansion
in the noise parameter $\epsilon$. We map the problem onto a one-dimensional Ising model in a large
field of random signs and calculate the expansion coefficients up to second order in $\epsilon$. Using a
conjecture we extend the calculation to 11th order and discuss the convergence of the resulting
series.
1 Introduction



Hidden Markov Processes (HMPs) have many applications in a wide range of disciplines, from
the theory of communication [1] to analysis of gene expression [2]. Comprehensive reviews on both
theory and applications of HMPs can be found in ([1], [3]). Recent applications to experimental
physics are in ([4], [5]). The most widely used context of HMPs is, however, that of construction of
reliable and efficient communication channels.

In a practical communication channel the aim is to reliably transmit a source message over a
noisy channel. Fig. 1 shows a schematic representation of such a communication. The source
message can be a stream of words taken from a text. It is clear that such a stream of words
contains information, indicating that words and letters are not chosen randomly. Rather, the
probability that a particular word (or letter) appears at a given point in the stream depends on
the words (letters) that were previously transmitted. Such dependency of a transmitted symbol on
the preceding stream is modelled by a Markov process.

The Markov model is a finite state machine that changes state once every time unit. The 
manner in which the state transitions occur is probabilistic and is governed by a state-transition 
matrix, P, that generates the new state of the system. The Markovian assumption indicates that 
the state at any given time depends only on the state at the previous time step. When dealing 
with text, a state usually represents either a letter, a word or a finite sequence of words, and the 
state-transition matrix represents the probability that a given state is followed by another state. 
Estimating the state-transition matrix is in the realm of linguistics; it is done by measuring the 
probability of occurrence of pairs of successive letters in a large corpus. 

One should bear in mind that the Markovian assumption is very restrictive and very few physical
systems can be expected to satisfy it in a strict manner. Clearly, a Markov process imitates some
statistical properties of a given language, but can generate a chain of letters that is grammatically
erroneous and lacks logical meaning. Even though the Markovian description represents only some
limited subset of the correlations that govern a complex process, it is the simplest natural starting
point for analysis. Thus one assumes that the original message, represented by a sequence of $N$
binary bits, has been generated by some Markov process. In the simplest scenario, of a binary
symmetric Markov process, the underlying Markov model is characterized by a single parameter,
the flipping rate $p$, denoting the probability that a 0 is followed by a 1 (the same as a 1 followed by
a 0). The stream of $N$ bits is transmitted through a noisy communication channel. The received
string differs from the transmitted one due to the noise. The simplest way to model the noise is
known as the Binary Symmetric Channel (BSC), where each bit of the original message is flipped with
probability $\epsilon$. Since the observer sees only the received, noise-corrupted version of the message,
and neither the original message nor the value of $p$ that generated it is known to him, what he
records is the outcome of a Hidden Markov Process. Thus, HMPs are doubly embedded stochastic
processes; the first is the Markov process that generated the original message and the second, which
does not influence the Markov process, is the noise added to the Markov chain after it has been
generated.
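The two-stage construction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sample_hmp` and the 0/1 bit encoding are assumptions made here, not part of the original text.

```python
import random

def sample_hmp(n, p, eps, rng):
    """Draw (S, R): a symmetric binary Markov chain S and its noisy copy R.

    Hypothetical helper for illustration; bits are encoded as 0/1.
    """
    s = [rng.choice((0, 1))]                 # first bit is uniform
    for _ in range(n - 1):
        flip = rng.random() < p              # Markov flip with probability p
        s.append(s[-1] ^ flip)               # XOR with a bool toggles the bit
    # The channel flips each transmitted bit independently with probability eps.
    r = [bit ^ (rng.random() < eps) for bit in s]
    return s, r

if __name__ == "__main__":
    hidden, observed = sample_hmp(20, p=0.1, eps=0.05, rng=random.Random(1))
    print("S:", "".join(map(str, hidden)))
    print("R:", "".join(map(str, observed)))
```

The observer only has access to `observed`; both `hidden` and the value of `p` stay unknown to him, which is what makes the process "hidden".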

Efficient information transmission plays a central role in modern society, and takes a variety
of forms, from telephone and satellite communication to storing and retrieving information on
disk drives. Two central aspects of this technology are error correction and compression. For both
problem areas it is of central importance to estimate $\Omega_R$, the number of (expected) received signals.

In the noise free case this equals the expected number of transmitted signals $\Omega_S$; when the
Markov process has flipping rate $p = 0$, only two strings (all 1 or all 0) will be generated and
$\Omega_S = 2$, while when the flip rate is $p = 1/2$ each string is equally likely and $\Omega_S = 2^N$.

In general, $\Omega_R$ is given, for large $N$, by $2^{NH}$, where $H = H(p, \epsilon)$ is the entropy of the process.
The importance of knowing $\Omega_R$ for compression is evident: one can number the possible messages
$i = 1, 2, \ldots, \Omega_R$, and if $\Omega_R < 2^N$, by transmitting only the index of the message (which can be
represented by $\log_2 \Omega_R < N$ bits) we compress the information. Note that we can get further
compression using the fact that the $\Omega_R$ messages do not have equal probabilities.

Error correcting codes are commonly used in methods of information transmission to compensate
for noise corruption of the data during transmission. These methods require the use of
additional transmitted information, i.e., redundancy, together with the data itself. That is, one
transmits a string of $M > N$ bits; the percentage of additional transmitted bits required to recover
the source message determines the coding efficiency, or channel capacity, a concept introduced and
formulated by Shannon. The channel capacity for the BSC and for a random i.i.d. source was
explicitly derived by Shannon in his seminal paper of 1948 [6]. The calculation of channel capacity
for a Markovian source transmitted over a noisy channel is still an open question.

Hence, calculating the entropy of a HMP is an important ingredient of progress towards deriving
improved estimates of both compression and channel capacity, of both theoretical and practical
importance for modern communication. In this paper we calculate the entropy of a HMP as a
power series in the noise parameter $\epsilon$.

In Section 2 we map the problem onto that of a one-dimensional nearest neighbor Ising model
in a field of fixed magnitude and random signs (see [7] for a review on the Random Field Ising
Model). Expansion in $\epsilon$ corresponds to working near the infinite field limit.

Note that the object we are calculating is not the entropy of an Ising chain in a quenched
random field, as shown in eq. (17) and in the discussion following it. In technical terms, here
we set the replica index to $n = 1$ after the calculation, whereas to obtain the (quenched average)
properties of an Ising chain one works in the $n \to 0$ limit.

In Sec. 3 we present exact results for the expansion coefficients of the entropy up to second
order. While the zeroth and first order terms were previously known ([8], [9]), the second order
term was not [10]. In Sec. 4 we introduce bounds on the entropy that were derived by Cover
and Thomas [11]; we have strong evidence that these bounds actually provide the exact expansion
coefficients. Since we have not proved this statement, it is presented as a conjecture; on its basis
the expansion coefficients up to eleventh order are derived and listed. In Sec. 5 we study
the radius of convergence of the low-noise expansion, and we summarize our results in Sec. 6.

2 A Hidden Markov Process and the Random-Field Ising Model 

2.1 Defining the process and its entropy 

Consider the case of a binary signal generated by the source. Binary valued symbols, $s_i = \pm 1$, are
generated and transmitted at fixed times $i \Delta t$. Denote a sequence of $N$ transmitted symbols by

$$ S = (s_1, s_2, \ldots, s_N) \tag{1} $$

The sequence is generated by a Markov process; here we assume that the value of $s_{i+1}$ depends
only on $s_i$ (and not on the symbols generated at previous times). The process is parametrized by
a transition matrix $P$, whose elements are the transition probabilities

$$ P_{+,-} = \Pr(s_{i+1} = +1 \mid s_i = -1), \qquad P_{-,+} = \Pr(s_{i+1} = -1 \mid s_i = +1) \tag{2} $$

Here we treat the case of a symmetric process, i.e. $P_{+,-} = P_{-,+} = p$, so that we have

$$ s_{i+1} = \begin{cases} s_i & \text{with prob. } 1 - p \\ -s_i & \text{with prob. } p \end{cases} \tag{3} $$

The first symbol $s_1$ takes the values $\pm 1$ with equal probabilities, $\Pr(s_1 = +1) = \Pr(s_1 = -1) = 1/2$.
The probability of realizing a particular sequence $S$ is given by

$$ \Pr(S) = \frac{1}{2} \prod_{i=2}^{N} \Pr(s_i \mid s_{i-1}) \tag{4} $$

The generated sequence $S$ is "sent" and passes through a noisy channel; hence the received sequence,

$$ R = (r_1, r_2, \ldots, r_N) \tag{5} $$

is not identical to the transmitted one. The noise can flip a transmitted symbol with probability $\epsilon$:

$$ \Pr(r_i = -s_i \mid s_i) = \epsilon \tag{6} $$

Here we assumed that the noise is generated by an independent identically distributed (i.i.d.) process;
the probability of a flip at time $i$ is independent of what happened at other times $j < i$ and of the
value of $i$. We also assume that the noise is symmetric, i.e. the flip probability does not depend on $s_i$.

Once the underlying Markov process $S$ has been generated, the probability of observing a
particular sequence $R$ is given by

$$ \Pr(R \mid S) = \prod_{i=1}^{N} \Pr(r_i \mid s_i) \tag{7} $$

and the joint probability of any particular $S$ and $R$ to occur is given by

$$ \Pr(R, S) = \Pr(R \mid S) \Pr(S) \tag{8} $$

The original transmitted signal, $S$, is "hidden" and only the received (and typically corrupted)
signal $R$ is "seen" by the observer. Hence it is meaningful to ask: what is the probability to
observe any particular received signal $R$? The answer is

$$ Q(R) = \sum_S \Pr(R, S) \tag{9} $$

Furthermore, one is interested in the Shannon entropy (see footnote 1) $H_N$ of the observed process,

$$ H_N = -\sum_R Q(R) \log Q(R) \tag{10} $$

and in particular, in the entropy rate, defined as

$$ H = \lim_{N \to \infty} \frac{H_N}{N} \tag{11} $$

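For short chains, eqs. (4)-(10) can be evaluated by explicit enumeration. The following is a minimal brute-force sketch (the function names are hypothetical, natural logarithms are used as in the text, and the cost grows as $4^N$, so it is only meant for small $N$).

```python
import itertools
import math

def joint_probability(r, s, p, eps):
    """Pr(R, S) from eqs. (4), (7) and (8) for +/-1 valued sequences."""
    prob = 0.5                                    # Pr(s_1) = 1/2
    for prev, cur in zip(s, s[1:]):
        prob *= (1 - p) if cur == prev else p     # Markov step, eq. (3)
    for r_i, s_i in zip(r, s):
        prob *= (1 - eps) if r_i == s_i else eps  # channel noise, eq. (6)
    return prob

def block_entropy(n, p, eps):
    """H_N of eq. (10), summing over all received R and all hidden S."""
    sequences = list(itertools.product((+1, -1), repeat=n))
    h = 0.0
    for r in sequences:
        q = sum(joint_probability(r, s, p, eps) for s in sequences)  # eq. (9)
        h -= q * math.log(q)
    return h

if __name__ == "__main__":
    p, eps = 0.3, 0.05
    for n in range(1, 7):
        print(n, block_entropy(n, p, eps) / n)    # slowly approaches H, eq. (11)
```

The ratio $H_N/N$ converges only as $1/N$, which is one reason why finite differences of $H_N$ are used later in the paper.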
2.2 Casting the problem in Ising form 

It is straightforward to cast the calculation of the entropy rate into the form of a one-dimensional
Ising model. The conditional Markov probabilities (3), that connect the symbols from one site to
the next, can be rewritten as

$$ \Pr(s_{i+1} \mid s_i) = \frac{e^{J s_{i+1} s_i}}{e^{J} + e^{-J}} \quad \text{with} \quad e^{2J} = \frac{1-p}{p} \tag{12} $$

and similarly, the flip probability generated by the noise, (6), can also be written in the Ising form

$$ \Pr(r_i \mid s_i) = \frac{e^{K r_i s_i}}{e^{K} + e^{-K}} \quad \text{with} \quad e^{2K} = \frac{1-\epsilon}{\epsilon} \tag{13} $$

Footnote 1: The Shannon entropy is defined using $\log_2$; we use the natural logarithm for simplicity.

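The joint probability of realizing a pair of transmitted and observed sequences $(S, R)$ takes the
form ([12], [13])

$$ \Pr(R, S) = A \exp\left( J \sum_{i=1}^{N-1} s_{i+1} s_i + K \sum_{i=1}^{N} r_i s_i \right) \tag{14} $$

where the constant $A$ is the product of two factors, $A = A_0 A_1$, given by

$$ A_0 = \frac{1}{2} \left( e^{J} + e^{-J} \right)^{-(N-1)}, \qquad A_1 = \left( e^{K} + e^{-K} \right)^{-N} \tag{15} $$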

The first sum in (14) is the Hamiltonian of a chain of Ising spins with open boundary conditions
and nearest neighbor interactions $J$; the interactions are ferromagnetic ($J > 0$) for $p < 1/2$. The
second term corresponds, for small noise $\epsilon < 1/2$, to a strong ferromagnetic interaction $K$ between
each spin $s_i$ and another spin, $r_i$, connected to $s_i$ by a "dangling bond" (see Fig. 2).

Denote the summation over the hidden variables by $Z(R)$:

$$ Z(R) = \sum_S \exp\left( J \sum_{i=1}^{N-1} s_{i+1} s_i + K \sum_{i=1}^{N} r_i s_i \right) \tag{16} $$
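The Ising rewriting can be checked numerically on a short chain. The sketch below compares the direct product of transition and noise probabilities with the Boltzmann-like form of eq. (14); it assumes open boundary conditions, with $A_0$ carrying the factor $1/2$ from $\Pr(s_1)$ and $N-1$ bond normalizations, and the function names are hypothetical.

```python
import itertools
import math

def joint_direct(r, s, p, eps):
    """Pr(R, S) as a product of Markov and noise factors, eqs. (4), (7), (8)."""
    prob = 0.5
    for prev, cur in zip(s, s[1:]):
        prob *= (1 - p) if cur == prev else p
    for r_i, s_i in zip(r, s):
        prob *= (1 - eps) if r_i == s_i else eps
    return prob

def joint_ising(r, s, p, eps):
    """Pr(R, S) in the Ising form A * exp(J sum s s' + K sum r s), eq. (14)."""
    n = len(s)
    J = 0.5 * math.log((1 - p) / p)        # eq. (12)
    K = 0.5 * math.log((1 - eps) / eps)    # eq. (13)
    A0 = 0.5 * (math.exp(J) + math.exp(-J)) ** (-(n - 1))
    A1 = (math.exp(K) + math.exp(-K)) ** (-n)
    bonds = sum(a * b for a, b in zip(s, s[1:]))
    fields = sum(a * b for a, b in zip(r, s))
    return A0 * A1 * math.exp(J * bonds + K * fields)

def max_discrepancy(n, p, eps):
    """Largest absolute difference between the two forms over all (R, S)."""
    seqs = list(itertools.product((+1, -1), repeat=n))
    return max(abs(joint_direct(r, s, p, eps) - joint_ising(r, s, p, eps))
               for r in seqs for s in seqs)

if __name__ == "__main__":
    print(max_discrepancy(4, p=0.3, eps=0.05))  # expected to be at rounding level
```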

so that the probability $Q(R)$ becomes (see eq. (9)) $Q(R) = A Z(R)$. Substituting in (10), the
entropy of the process can be written as

$$ H_N = -\sum_R A Z(R) \log[A Z(R)] = -\frac{d}{dn} \left[ A^n \sum_R Z(R)^n \right]_{n=1} \tag{17} $$

The interpretation of this expression is obvious: an Ising chain is submitted to local fields
$h_i = K r_i$, with the sign of the field at each site being $\pm$ with equal probabilities, and we have to
average $Z(h_1, \ldots, h_N)^n$ over the field configurations. This is precisely the problem one faces in order
to calculate properties of a well-studied model, of a nearest neighbor Ising chain in a quenched
random field of uniform strength and random signs at the different sites (there one is interested,
however, in the limit $n \to 0$). This problem has not been solved analytically, although a few exactly
solvable simplified versions of the model do exist ([14], [15], [16], [17]), as well as expansions (albeit
in the weak field limit [18]).

One should note that here we calculate the entropy associated with the observed variables $R$.
In the Ising language this corresponds to an entropy associated with the randomly assigned signs
of the local fields, and not to the entropy of the spins $S$. Because of this distinction the entropy
$H_N$ has no obvious physical interpretation or relevance, which explains why the problem has not
been addressed yet by the physics community.
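The identity between the entropy and the derivative with respect to the replica index at $n = 1$ can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the original calculation: it enumerates $Q(R) = A Z(R)$ for a short chain and approximates $-\frac{d}{dn} \sum_R Q(R)^n$ at $n = 1$ by a central finite difference (the step `dn` and the function names are choices made here).

```python
import itertools
import math

def observed_probabilities(n, p, eps):
    """Q(R) of eq. (9) for every received sequence of length n (brute force)."""
    seqs = list(itertools.product((+1, -1), repeat=n))
    probs = []
    for r in seqs:
        q = 0.0
        for s in seqs:
            weight = 0.5
            for prev, cur in zip(s, s[1:]):
                weight *= (1 - p) if cur == prev else p
            for r_i, s_i in zip(r, s):
                weight *= (1 - eps) if r_i == s_i else eps
            q += weight
        probs.append(q)
    return probs

def entropy_direct(probs):
    """H_N = -sum_R Q log Q, eq. (10)."""
    return -sum(q * math.log(q) for q in probs)

def entropy_replica(probs, dn=1e-5):
    """-d/dn sum_R Q(R)^n evaluated at n = 1 by a central difference."""
    def moment(n):
        return sum(q ** n for q in probs)
    return -(moment(1 + dn) - moment(1 - dn)) / (2 * dn)

if __name__ == "__main__":
    qs = observed_probabilities(4, p=0.3, eps=0.05)
    print(entropy_direct(qs), entropy_replica(qs))
```

Evaluating the derivative at $n = 1$ (rather than taking $n \to 0$) is what distinguishes this quantity from the quenched free energy of the random-field chain.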

We are interested in calculating the entropy rate in the limit of small noise, i.e. $\epsilon \ll 1$. In the
Ising representation this limit corresponds to $K \gg 1$ and hence an expansion in $\epsilon$ corresponds to
expanding near the infinite field limit of the Ising chain.

3 Expansion to order $\epsilon^2$: exact results

We are interested in calculating the entropy rate

$$ H = -\lim_{N \to \infty} \frac{1}{N} \sum_R A Z(R) \log[A Z(R)] \tag{18} $$

to a given order in $\epsilon$. A few technical points are in order. First, we will actually use

$$ e^{-2K} = \frac{\epsilon}{1 - \epsilon} \tag{19} $$

as our small parameter and expand to order $\epsilon^2$ afterwards. Second, we will calculate $H_N$ and
take the large $N$ limit. Therefore we can replace the open boundary conditions with periodic ones
(setting $s_{N+1} = s_1$); the difference is a surface effect of order $1/N$. The constant $A_0$ becomes

$$ A_0 = \left( e^{J} + e^{-J} \right)^{-N} \tag{20} $$

and the interaction term $J s_1 s_N$ is added to the first sum in eq. (14), which now contains $N$ pairs
of neighbors.

Expanding $Z(R)$: Consider $Z(R)$ from (16). For any fixed $R = (r_1, r_2, \ldots, r_N)$ the leading order
is obtained by the $S$ configuration with $s_i = r_i$ for all $i$. For this configuration each site contributes
$K$ to the "field term" in (16). The contribution of this configuration to the summation over $S$ in
(16) is

$$ Z(R)^{(0)} = e^{NK} \exp\left( J \sum_{i=1}^{N} r_i r_{i+1} \right) \tag{21} $$

The next term we add consists of the contributions of those $S$ configurations which have $s_i = r_i$
at all but one position. The field term of such a configuration is $K$ from $N - 1$ sites and $-K$ from
the single site with $s_j = -r_j$. There are $N$ such configurations, and the total contribution of these
terms to the sum (16) is

$$ Z(R)^{(1)} = e^{NK} e^{-2K} \exp\left( J \sum_{i=1}^{N} r_i r_{i+1} \right) \sum_{j=1}^{N} \exp\left[ -2J r_j (r_{j-1} + r_{j+1}) \right] \tag{22} $$
The next term is of the highest order studied in this paper; it involves configurations $S$ with all but
two spins in the state $s_i = r_i$; the other two take the values $s_j = -r_j$, $s_k = -r_k$, i.e. are flipped
with respect to the corresponding local fields. These $S$ configurations belong to one of two classes.
In class $a$ the two flipped spins are located on nearest neighbor sites, e.g. $k = j + 1$; there are $N$
such configurations. To the second class, $b$, belong those configurations in which the two flipped
spins are not neighbors; there are $N(N-3)/2$ such terms in the sum (16), and the respective
contributions are (see footnote 2)

$$ Z(R)^{(2a)} = e^{NK} e^{-4K} \exp\left( J \sum_{i=1}^{N} r_i r_{i+1} \right) \sum_{j=1}^{N} \exp\left[ -2J (r_{j-1} r_j + r_{j+1} r_{j+2}) \right] \tag{23} $$

$$ Z(R)^{(2b)} = e^{NK} e^{-4K} \exp\left( J \sum_{i=1}^{N} r_i r_{i+1} \right) \frac{1}{2} \sum_{j=1}^{N} \sum_{k \neq j, j \pm 1} \exp\left[ -2J r_j (r_{j-1} + r_{j+1}) - 2J r_k (r_{k-1} + r_{k+1}) \right] \tag{24} $$
Calculation of $H$ is now straightforward, albeit tedious: substitute $AZ$ into eq. (18), expand
everything in powers of $\epsilon$, to second order, and for each term perform the summation over all the
$r_j$ variables. These summations involve two kinds of terms. The first is of the "partition-sum-like"
form

$$ \sum_R e^{\mathcal{H}(R)} \quad \text{where} \quad \mathcal{H}(R) = \sum_j \Lambda_j J r_j r_{j+1} \quad \text{with} \quad \Lambda_j = \pm 1 \tag{25} $$

For the case studied here we encounter either all bonds $\Lambda_j J > 0$, or two have a flipped sign
(corresponding to eqs. (22), (23)), or four have flipped signs (corresponding to (24)). These
"partition-sum-like" terms are independent of the signs of the $\Lambda_j$; in fact we have for all of them

$$ A_0 \sum_R e^{\mathcal{H}(R)} = 1 \tag{26} $$

The second type of term that contributes to $H$ is of the "energy-like" form:

$$ \sum_R e^{\mathcal{H}(R)} r_k r_{k+1} \tag{27} $$

The absolute value of these terms is again independent of the $\Lambda_j$, but one has to keep track of
their signs. Finally, one has to remember that the constant $A_1$ also has to be expanded in $\epsilon$. The
calculation finally yields the following result (here we switch from $J$ to the "natural" variable $p$
using eq. (12)):

$$ H(p, \epsilon) = \sum_{k=0}^{\infty} H^{(k)}(p) \epsilon^k \tag{28} $$

Footnote 2: We use the obvious identifications imposed by periodic boundary conditions, e.g. $r_{N+1} = r_1$, $r_{N+2} = r_2$.
with the coefficients given by

$$ H^{(0)} = -p \log p - (1-p) \log(1-p), \qquad H^{(1)} = 2(1-2p) \log\left[ \frac{1-p}{p} \right] \tag{29} $$

$$ H^{(2)} = -2(1-2p) \log\left[ \frac{1-p}{p} \right] - \frac{(1-2p)^2}{2p^2(1-p)^2} \tag{30} $$

The zeroth and first order terms (29) were known ([8], [9]), while the second order term is new [10].
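As a numerical sanity check of eqs. (29)-(30), one may compare the truncated series with a finite-chain estimate of the entropy rate. The sketch below is illustrative only: it uses the difference $H_N - H_{N-1}$ at $N = 8$ as a stand-in for $H$ (an assumption that is reasonable for small $\epsilon$, not a proof), evaluates $Q(R)$ by a standard forward recursion, and all function names are hypothetical.

```python
import itertools
import math

STATES = (+1, -1)

def observed_probability(r, p, eps):
    """Q(R) via a forward recursion over the hidden symbols."""
    alpha = {s: 0.5 * ((1 - eps) if s == r[0] else eps) for s in STATES}
    for r_i in r[1:]:
        alpha = {
            new: ((1 - eps) if new == r_i else eps)
            * sum(alpha[old] * ((1 - p) if new == old else p) for old in STATES)
            for new in STATES
        }
    return sum(alpha.values())

def block_entropy(n, p, eps):
    """H_N = -sum_R Q(R) log Q(R) over all 2^n received sequences."""
    total = 0.0
    for r in itertools.product(STATES, repeat=n):
        q = observed_probability(r, p, eps)
        total -= q * math.log(q)
    return total

def finite_chain_estimate(n, p, eps):
    """H_N - H_{N-1}, used here as a numerical proxy for the entropy rate."""
    return block_entropy(n, p, eps) - block_entropy(n - 1, p, eps)

def series_second_order(p, eps):
    """H^(0) + H^(1) eps + H^(2) eps^2 with the coefficients of eqs. (29)-(30)."""
    log_ratio = math.log((1 - p) / p)
    h0 = -p * math.log(p) - (1 - p) * math.log(1 - p)
    h1 = 2 * (1 - 2 * p) * log_ratio
    h2 = -2 * (1 - 2 * p) * log_ratio - (1 - 2 * p) ** 2 / (2 * p ** 2 * (1 - p) ** 2)
    return h0 + h1 * eps + h2 * eps ** 2

if __name__ == "__main__":
    p, eps = 0.3, 0.01
    print(series_second_order(p, eps), finite_chain_estimate(8, p, eps))
```

For these parameters the second-order term contributes about $2.5 \times 10^{-4}$, so agreement at the $10^{-5}$ level is sensitive to the value of $H^{(2)}$.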

4 Upper Bounds derived using a system of finite length 

When investigating the limit $H$, it is useful to study the quantity $C_N = H_N - H_{N-1}$, which is also
known as the conditional entropy. $C_N$ can be interpreted as the average amount of uncertainty we
have on $r_N$, assuming that we know $(r_1, \ldots, r_{N-1})$. Provided that $H$ exists, it easily follows that

$$ H = \lim_{N \to \infty} C_N \tag{31} $$

Moreover, according to [11], $C_N \ge H$, and the convergence is monotone:

$$ C_N \searrow H \quad (N \to \infty) \tag{32} $$
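The interpretation of $C_N$ as the uncertainty about $r_N$ given the earlier received symbols, and its monotone decrease, can be illustrated on short chains. The sketch below is a hypothetical helper written for this illustration; it computes $C_N$ directly as the average of $-\log \Pr(r_N \mid r_1, \ldots, r_{N-1})$, using a forward recursion for the sequence probabilities.

```python
import itertools
import math

STATES = (+1, -1)

def observed_probability(r, p, eps):
    """Probability of the received sequence r, summed over hidden sequences."""
    alpha = {s: 0.5 * ((1 - eps) if s == r[0] else eps) for s in STATES}
    for r_i in r[1:]:
        alpha = {
            new: ((1 - eps) if new == r_i else eps)
            * sum(alpha[old] * ((1 - p) if new == old else p) for old in STATES)
            for new in STATES
        }
    return sum(alpha.values())

def conditional_entropy(n, p, eps):
    """C_N for n >= 2: average of -log Pr(r_N | r_1..r_{N-1}) over all R."""
    c = 0.0
    for r in itertools.product(STATES, repeat=n):
        q_full = observed_probability(r, p, eps)
        q_prefix = observed_probability(r[:-1], p, eps)
        c -= q_full * math.log(q_full / q_prefix)
    return c

if __name__ == "__main__":
    for n in range(2, 8):
        print(n, conditional_entropy(n, p=0.2, eps=0.1))
```

For $N = 2$ the result reduces to the binary entropy of $\Pr(r_2 \neq r_1) = p + 2\epsilon(1-\epsilon)(1-2p)$, which gives a simple closed-form check.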



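We can express $C_N$ as a function of $p$ and $\epsilon$ by using eq. (17). For this, we represent $Z$ using the
original variables $p$, $\epsilon$ (note that from this point on we work with open boundary conditions on the
Ising chain of $N$ spins):

$$ Z(R) = \frac{1}{2} \sum_S (1-p)^{\sum_{i=1}^{N-1} 1_{s_i = s_{i+1}}} \; p^{\,N-1-\sum_{i=1}^{N-1} 1_{s_i = s_{i+1}}} \; (1-\epsilon)^{\sum_{i=1}^{N} 1_{s_i = r_i}} \; \epsilon^{\,N-\sum_{i=1}^{N} 1_{s_i = r_i}} \tag{33} $$

where we denote $1_{s = s'} = (1 + s s')/2$; with this normalization $Z(R)$ coincides with $Q(R)$, so the
constant $A$ is absorbed. Eq. (33) gives $Z(R)$ explicitly as a polynomial in $p$ and $\epsilon$ with
maximal degree $N$, and can be represented as:

$$ Z(R) = \sum_{i=0}^{N} Z_i(R) \epsilon^i \tag{34} $$

Here $Z_i = Z_i(R)$ are functions of $p$ only.

Substituting this expansion in eq. (17), and expanding $\log Z(R)$ according to the Taylor series
$\log(a + x) = \log(a) - \sum_{n=1}^{\infty} \frac{(-x)^n}{n a^n}$, we get

$$ H_N = -\sum_R \left[ \sum_{i=0}^{k} Z_i(R) \epsilon^i \right] \left[ \log Z_0(R) - \sum_{j=1}^{k} \frac{\left( -\sum_{i=1}^{k} Z_i(R) \epsilon^i \right)^j}{j Z_0(R)^j} \right] + O(\epsilon^{k+1}) \tag{35} $$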
When extended to terms of order $\epsilon^k$, this equation gives us (through $C_N = H_N - H_{N-1}$) precisely
the expansion of the upper bound $C_N$ up to the $k$-th order,

$$ C_N = \sum_{i=0}^{k} C_N^{(i)} \epsilon^i + O(\epsilon^{k+1}) \tag{36} $$

For example, stopping at order $k = 2$ gives

$$ H_N = -\sum_R \left\{ Z_0(R) \log Z_0(R) + \left[ Z_1(R) (1 + \log Z_0(R)) \right] \epsilon + \left[ \frac{Z_1(R)^2}{2 Z_0(R)} + Z_2(R) (1 + \log Z_0(R)) \right] \epsilon^2 \right\} + O(\epsilon^3) \tag{37} $$

The zeroth and first order terms can be evaluated analytically for any $N$; beyond first order, we can
compute the expansion of $H_N$ symbolically (using Maple [19]; see footnote 3), for any finite $N$. This was
actually done, for $N \le 8$ and $k \le 11$. For the first order we have proved ([10]) that $C_N^{(1)}$ is independent of $N$
(and equals $H^{(1)}$). The symbolic computation of higher order terms yielded similar independence
of $N$, provided that $N$ is large enough. So, $C_N^{(k)} = C^{(k)}$ for large enough $N$. For example, $C_N^{(2)}$ is
independent of $N$ for $3 \le N \le 8$ and equals the exact value of $H^{(2)}$ as given by eq. (30). Similarly,
$C_N^{(4)}$ settles, for $N \ge 4$, at some value denoted by $C^{(4)}$, and so on. For the values we have checked,
the settling point for $C_N^{(k)}$ turned out to be at $N = \lceil \frac{k+3}{2} \rceil$. This behavior is, however, unproved for
$k > 2$, and, therefore, we refer to it as a conjecture.

Conjecture: For any order $k$, there is a critical chain length $N_c(k) = \lceil \frac{k+3}{2} \rceil$ such that for
$N \ge N_c(k)$ we have $C_N^{(k)} = C^{(k)}$.

It is known that $C_N \to H$, and $C_N$ and $H$ are analytic functions of $\epsilon$ at $\epsilon = 0$ (see footnote 4), so that we
can expand both sides around $\epsilon = 0$, and conclude that $C_N^{(k)} \to H^{(k)}$ for any $k \ge 1$ when $N \to \infty$.
Therefore, if our conjecture is true, and $C_N^{(k)}$ indeed settles at some value $C^{(k)}$ independent of $N$
(for $N \ge N_c(k)$), it immediately follows that this value equals $H^{(k)}$. Note that the settling is
rigorously supported for $k = 0, 1$, while for $k = 2$ we showed that indeed $C^{(2)} = H^{(2)}$, supporting
our conjecture.

The first orders up to $H^{(11)}$, obtained by identifying $H^{(k)}$ with $C^{(k)}$, are given in the Appendix, as
functions of $\lambda = 1 - 2p$, for better readability. The values of $H^{(0)}$, $H^{(1)}$ and $H^{(2)}$ coincide with the
results that were derived rigorously from the low-temperature/high-field expansion, thus giving us
support for postulating the above Conjecture.

Footnote 3: The computation we have done is exponential in $N$, but the complexity can be improved.
Footnote 4: See next section on the Radius of Convergence.
Interestingly, the numerators have a simpler expression when considered as functions of $\lambda$, which
is the second eigenvalue of the Markov transition matrix $P$. Note that only even powers of $\lambda$ appear.
Another interesting observation is that the free element in $[p(1-p)]^{2(k-1)} H^{(k)}$ (when treated as a
polynomial in $p$) is $\frac{(-1)^{k+1}}{k(k-1)}$, which might suggest some role for the function $\log\left( 1 + \frac{\epsilon}{[p(1-p)]^2} \right)$ in the
first derivative of $H$. All of the above observations led us to conjecture the following form for $H^{(k)}$
(for $k \ge 3$):

$$ H^{(k)} = \frac{2^{4(k-1)} \sum_{j=0}^{d_k} a_j^{(k)} \lambda^{2j}}{k(k-1)(1-\lambda^2)^{2(k-1)}} \tag{38} $$

where $a_j^{(k)}$ and $d_k$ are integers that can be seen in the Appendix for up to $k = 11$.
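The observation about the free element can be checked with exact rational arithmetic for the lowest orders. The sketch below copies the order 3, 4 and 5 expressions from the Appendix (as functions of $\lambda = 1 - 2p$) and evaluates $[p(1-p)]^{2(k-1)} H^{(k)}$ at $p = 0$, where $\lambda = 1$ and $1 - \lambda^2 = 4p(1-p)$. The table layout and function name are assumptions made for this illustration.

```python
from fractions import Fraction

# For each order k: (rational prefactor, integer coefficients of the bracketed
# polynomial in lambda^2, exponent of (1 - lambda^2) in the denominator),
# transcribed from the Appendix for k = 3, 4, 5.
APPENDIX = {
    3: (Fraction(-16, 3), [5, -10, -3], 4),
    4: (Fraction(8, 3), [109, 20, -114, -140, -3], 6),
    5: (Fraction(-128, 15), [95, 336, 762, -708, -769, -100], 8),
}

def free_element(k):
    """Value of [p(1-p)]^(2(k-1)) * H^(k) at p = 0, i.e. at lambda = 1."""
    prefactor, coefficients, power = APPENDIX[k]
    # At lambda = 1 every power of lambda equals 1, so the polynomial reduces
    # to the sum of its coefficients; [p(1-p)]^power / (1 - lambda^2)^power
    # equals 4^(-power) because 1 - lambda^2 = 4 p (1 - p).
    return prefactor * sum(coefficients) / Fraction(4) ** power

if __name__ == "__main__":
    for k in sorted(APPENDIX):
        print(k, free_element(k), Fraction((-1) ** (k + 1), k * (k - 1)))
```

In each of these cases the exponent of $(1 - \lambda^2)$ equals $2(k-1)$, so the factor $[p(1-p)]^{2(k-1)}$ cancels the denominator exactly.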

5 The Radius of Convergence 

If one wants to use our expansion around $\epsilon = 0$ for actually estimating $H$ at some value $\epsilon$, it is
important to ascertain that $\epsilon$ lies within the radius of convergence of the expansion. The fundamental
observation made here is that for $p = 0$, the function $H(\epsilon)$ is not an analytic function at
$\epsilon = 0$, since its first derivative diverges. As we increase $p$, the singular point 'moves' to negative
values of $\epsilon$, and hence the function is analytic at $\epsilon = 0$, but the radius of convergence is determined
by the distance of $\epsilon = 0$ from this singularity. Denote by $\rho(p)$ the radius of convergence of $H(\epsilon)$ for
a given $p$; we expect that $\rho(p)$ grows when we increase $p$, while for $p \to 0$, $\rho(p) \to 0$.

It is useful to first look at a simpler model, in which there is no interaction between the spins.
Instead, each spin is in an external field which has a uniform constant component $J$, and a
site-dependent component of absolute value $K$ and a random sign. For this simple i.i.d. model the
entropy rate takes the form

$$ H = h_b[p(1-\epsilon) + \epsilon(1-p)] \tag{39} $$

where $h_b[x] = -[x \log x + (1-x) \log(1-x)]$ is the binary entropy function. Note that for $\epsilon = 0$ the
entropy of this model equals that of the Ising chain. Expanding eq. (39) in $\epsilon$ (for $p > 0$) gives:

$$ H = -(p \log p + (1-p) \log(1-p)) + (1-2p) \log\left( \frac{1-p}{p} \right) \epsilon + \sum_{k=2}^{\infty} \frac{-1}{k(k-1)} \left[ \frac{(2p-1)^k}{p^{k-1}} + \frac{(1-2p)^k}{(1-p)^{k-1}} \right] \epsilon^k \tag{40} $$

The radius of convergence here is easily shown to be $p/(1-2p)$; it goes to 0 for $p \to 0$ and increases
monotonically with $p$.
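For the i.i.d. model both statements can be checked numerically: the partial sums of the series reproduce the binary entropy function inside the radius of convergence, and the ratio of successive coefficients approaches $p/(1-2p)$. The following sketch assumes the Taylor coefficients $-\frac{1}{k(k-1)}\left[\frac{(2p-1)^k}{p^{k-1}} + \frac{(1-2p)^k}{(1-p)^{k-1}}\right]$ for $k \ge 2$, obtained by differentiating $h_b$ at $x = p$; the function names are hypothetical.

```python
import math

def binary_entropy(x):
    """h_b[x] with natural logarithms."""
    return -(x * math.log(x) + (1 - x) * math.log(1 - x))

def iid_coefficient(k, p):
    """Coefficient of eps^k (k >= 2) in the expansion of h_b[p + eps (1 - 2p)]."""
    lam = 1 - 2 * p
    return -((-lam) ** k / p ** (k - 1) + lam ** k / (1 - p) ** (k - 1)) / (k * (k - 1))

def iid_series(p, eps, order):
    """Partial sum of the expansion up to the given order in eps."""
    total = binary_entropy(p) + (1 - 2 * p) * math.log((1 - p) / p) * eps
    total += sum(iid_coefficient(k, p) * eps ** k for k in range(2, order + 1))
    return total

def ratio_estimate(k, p):
    """|c_k / c_{k+1}|, which tends to the radius of convergence as k grows."""
    return abs(iid_coefficient(k, p) / iid_coefficient(k + 1, p))

if __name__ == "__main__":
    p = 0.2
    print("exact radius  :", p / (1 - 2 * p))
    print("ratio at k=200:", ratio_estimate(200, p))
    print("series vs h_b :", iid_series(p, 0.05, 60), binary_entropy(p + 0.05 * (1 - 2 * p)))
```

The ratio converges only as $1/k$, which is why a fit in $k$, rather than the raw ratio at the highest available order, is a natural way to extrapolate from a small number of coefficients.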

Returning to the HMP, the coefficients $H^{(k)}$ are (in absolute value) usually larger than those of the
simpler i.i.d. model, and hence the radius of convergence may be expected to be smaller. Since we
could not derive $\rho(p)$ analytically, we estimated it using extrapolation based on the first 11 orders.
We use the fact that $\rho(p) = \lim_{k \to \infty} \left| \frac{H^{(k)}}{H^{(k+1)}} \right|$ (provided the limit exists). The data was fitted to a
rational function of the following form (which holds for the i.i.d. model):

$$ \left| \frac{H^{(k)}}{H^{(k+1)}} \right| \approx \frac{ak + b}{k + c} \tag{41} $$

For a given fit, the radius of convergence was simply estimated by $a$. The resulting prediction is
given in Fig. 3 for both the i.i.d. model (for which it is compared to the known exact $\rho(p)$) and
for the HMP. While quantitatively the predicted radius of the HMP is much smaller than that of
the i.i.d. model, it has the same qualitative behavior, of starting at zero for $p = 0$, and increasing
with $p$.

We compared the analytic expansion to estimates of the entropy rate based on the lower and
upper bounds, for two values of $\epsilon$ (see Fig. 4). First we took $\epsilon = 0.01$, which is realistic in
typical communication applications. For $p$ less than about 0.1 this value of $\epsilon$ exceeds the radius of
convergence and the series expansion diverges, whereas for larger $p$ the series converges and gives a
very good approximation to $H(p, \epsilon = 0.01)$. The second value used was $\epsilon = 0.2$; here the divergence
happens for $p < 0.37$, so the expansion yields a good approximation for a much smaller range. We
note that, as expected, the approximation is much closer to the upper bound than to the lower
bound of [11].

6 Summary 

Transmission of a binary message through a noisy channel is modelled by a Hidden Markov Process.
We mapped the binary symmetric HMP onto an Ising chain in a random external field in thermal
equilibrium. Using a low-temperature/high-random-field expansion we calculated the entropy of
the HMP to second order $k = 2$ in the noise parameter $\epsilon$. We have shown for $k \le 11$ that when
the known upper bound on the entropy rate is expanded in $\epsilon$, using finite chains of length $N$, the
expansion coefficients settle, for $N_c(k) \le N \le 8$, to values that are independent of $N$. Posing a
conjecture, that this continues to hold for any $N$, we identified the expansion coefficients of the
entropy up to order 11. The radius of convergence of the resulting series was studied and the
expansion was compared to the known upper and lower bounds.

By using methods of Statistical Physics we were able to address a problem of considerable
current interest in the area of noisy communication channels and data compression.






Acknowledgments 



I.K. thanks N. Merhav for very helpful comments, and the Einstein Center for Theoretical Physics
for partial support. This work was partially supported by grants from the Minerva Foundation
and by the European Community's Human Potential Programme under contract
HPRN-CT-2002-00319, STIPCO.

References 

[1] Y. Ephraim and N. Merhav, "Hidden Markov processes", IEEE Trans. Inform. Theory, vol.
48, p. 1518-1569, June 2002.

[2] A. Schliep, A. Schönhuth and C. Steinhoff, "Using hidden Markov models to analyze gene
expression time course data", Bioinformatics, 19, Suppl. 1, p. i255-i263, 2003.

[3] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech
recognition", Proc. IEEE, vol. 77, p. 257-286, Feb 1989.

[4] I. Kanter, A. Frydman and A. Ater, "Is a multiple excitation of a single atom equivalent to a
single ensemble of atoms?", Europhys. Lett., 2005 (in press).

[5] I. Kanter, A. Frydman and A. Ater, "Utilizing hidden Markov processes as a new tool for
experimental physics", Europhys. Lett., 2005 (in press).

[6] C. E. Shannon, "A mathematical theory of communication", Bell System Technical Journal,
27, p. 379-423 and 623-656, Jul and Oct, 1948.

[7] T. Nattermann, "Theory of the Random Field Ising Model", in Spin Glasses and Random
Fields, ed. by A. P. Young, World Scientific, 1997.

[8] P. Jacquet, G. Seroussi and W. Szpankowski, "On the Entropy of a Hidden Markov Process",
Data Compression Conference, Snowbird, 2004.

[9] E. Ordentlich and T. Weissman, "New Bounds on the Entropy Rate of Hidden Markov
Processes", San Antonio Information Theory Workshop, Oct 2004.

[10] A preliminary presentation of our results is given in O. Zuk, I. Kanter and E. Domany,
"Asymptotics of the Entropy Rate for a Hidden Markov Process", Data Compression Conference,
Snowbird, 2005.

[11] T. M. Cover and J. A. Thomas, "Elements of Information Theory", Wiley, New York, 1991.

[12] L. K. Saul and M. I. Jordan, "Boltzmann chains and hidden Markov models", Advances in
Neural Information Processing Systems 7, MIT Press, 1994.

[13] D. J. C. MacKay, "Equivalence of Boltzmann Chains and Hidden Markov Models", Neural
Computation, vol. 8 (1), p. 178-181, Jan 1996.

[14] B. Derrida, M. Mendès France and J. Peyrière, "Exactly Solvable One-Dimensional Inhomogeneous
Models", Journal of Stat. Phys. 45 (3-4), p. 439-449, Nov 1986.

[15] D. S. Fisher, P. Le Doussal and C. Monthus, "Nonequilibrium dynamics of random field Ising
spin chains: Exact results via real space renormalization group", Phys. Rev. E, 64 (6),
066107, Dec 2001.

[16] G. Grinstein and D. Mukamel, "Exact Solution of a One Dimensional Ising Model in a Random
Magnetic Field", Phys. Rev. B, 27, p. 4503-4506, 1983.

[17] T. M. Nieuwenhuizen and J. M. Luck, "Exactly soluble random field Ising models in one
dimension", J. Phys. A: Math. Gen., 19, p. 1207-1227, May 1986.

[18] B. Derrida and H. J. Hilhorst, J. Phys. A 16, 2641 (1983).

[19] http://www.maplesoft.com/

Appendix 

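Orders three to eleven, as functions of $\lambda = 1 - 2p$. (Orders 0 to 2 are given in equations (29)-(30)):

$$ H^{(3)} = \frac{-16(5\lambda^4 - 10\lambda^2 - 3)\lambda^2}{3(1-\lambda^2)^4} $$

$$ H^{(4)} = \frac{8(109\lambda^8 + 20\lambda^6 - 114\lambda^4 - 140\lambda^2 - 3)\lambda^2}{3(1-\lambda^2)^6} $$

$$ H^{(5)} = \frac{-128(95\lambda^{10} + 336\lambda^8 + 762\lambda^6 - 708\lambda^4 - 769\lambda^2 - 100)\lambda^4}{15(1-\lambda^2)^8} $$

$$ H^{(6)} = 128(125\lambda^{14} - 321\lambda^{12} + 9525\lambda^{10} + 16511\lambda^8 - 7825\lambda^6 - 17995\lambda^4 - 4001\lambda^2 - 115)\lambda^4 / [15(1-\lambda^2)^{10}] $$

$$ H^{(7)} = -256(280\lambda^{18} - 45941\lambda^{16} - 110888\lambda^{14} + 666580\lambda^{12} + 1628568\lambda^{10} - 270014\lambda^8 - 1470296\lambda^6 - 524588\lambda^4 - 37296\lambda^2 - 245)\lambda^4 / [105(1-\lambda^2)^{12}] $$

$$ H^{(8)} = 64(56\lambda^{22} - 169169\lambda^{20} - 2072958\lambda^{18} - 5222301\lambda^{16} + 12116328\lambda^{14} + 35666574\lambda^{12} + 3658284\lambda^{10} - 29072946\lambda^8 - 14556080\lambda^6 - 1872317\lambda^4 - 48286\lambda^2 - 49)\lambda^4 / [21(1-\lambda^2)^{14}] $$

$$ H^{(9)} = 2048(37527\lambda^{22} + 968829\lambda^{20} + 8819501\lambda^{18} + 20135431\lambda^{16} - 23482698\lambda^{14} - 97554574\lambda^{12} - 30319318\lambda^{10} + 67137630\lambda^8 + 46641379\lambda^6 + 8950625\lambda^4 + 495993\lambda^2 + 4683)\lambda^6 / [63(1-\lambda^2)^{16}] $$

$$ H^{(10)} = -2048(38757\lambda^{26} + 1394199\lambda^{24} + 31894966\lambda^{22} + 243826482\lambda^{20} + 571835031\lambda^{18} - 326987427\lambda^{16} - 2068579420\lambda^{14} - 1054659252\lambda^{12} + 1173787011\lambda^{10} + 1120170657\lambda^8 + 296483526\lambda^6 + 26886370\lambda^4 + 684129\lambda^2 + 2187)\lambda^6 / [45(1-\lambda^2)^{18}] $$

$$ H^{(11)} = 8192(98142\lambda^{30} - 1899975\lambda^{28} + 92425520\lambda^{26} + 3095961215\lambda^{24} + 25070557898\lambda^{22} + 59810870313\lambda^{20} - 11635283900\lambda^{18} - 173686662185\lambda^{16} - 120533821070\lambda^{14} + 74948247123\lambda^{12} + 102982107048\lambda^{10} + 35567469125\lambda^8 + 4673872550\lambda^6 + 217466315\lambda^4 + 2569380\lambda^2 + 2277)\lambda^6 / [495(1-\lambda^2)^{20}] $$
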
Figure 1: Schematic drawing of message transmission through a noisy channel.

Figure 2: An Ising model in a random field. The solid lines represent interactions of strength $J$
between neighboring spins $s_i$, while the dashed lines represent local fields $K r_i$ acting on the spin
$s_i$.

Figure 3: Radius of convergence for the i.i.d. model (estimated and exact, see text), and HMP
(estimated) for 0.05 < p < 0.35.

Figure 4: Approximation using the first eleven orders in the expansion, for $\epsilon = 0.01$ (left) and
$\epsilon = 0.2$ (right), for various values of $p$. For comparison, upper and lower bounds (using $N = 2$
from [11]) are displayed. For each $\epsilon$ there is some critical $p$ below which the series diverges and the
approximation is poor. For larger $p$ the approximation becomes better.